Skip to content

Isolate core runtime and add headless foundation#118

Open
provencher wants to merge 2 commits into
mainfrom
feature/core-isolation-headless-foundation
Open

Isolate core runtime and add headless foundation#118
provencher wants to merge 2 commits into
mainfrom
feature/core-isolation-headless-foundation

Conversation

@provencher

Copy link
Copy Markdown
Collaborator

Summary

  • extract platform-neutral workspace, prompt, code-map, filesystem, security, and MCP contracts into RepoPromptCore, with narrow macOS and app composition adapters
  • add the standalone fail-closed repoprompt-headless host/CLI, packaging/install flows, direct-stdio MCP tools, and shared-runtime characterization fixtures
  • preserve app behavior and performance with shared capture/projection paths, bounded search concurrency, cache/coherence handling, and runtime diagnostics
  • harden security and lifecycle behavior with descriptor-safe snapshots/exports, locked state transactions, strict root policies, MCP session identity, cancellation, and joinable connection shutdown
  • add architecture guardrails, coordinated validation lanes, parity tests, and headless smoke coverage

Validation

  • make dev-format
  • make dev-lint
  • make guardrails
  • make conductor-selftest
  • make dev-swift-build PRODUCT=RepoPrompt
  • make dev-swift-build PRODUCT=repoprompt-mcp
  • make dev-swift-build PRODUCT=repoprompt-headless
  • make dev-test
  • make dev-headless-smoke
  • contribution preflight in both commit and push modes, including staged/outgoing secret scans

The post-rebase validation above ran against origin/main at 2544c83. A non-disruptive live-app smoke was attempted, but no CE app/MCP server was running; the app was intentionally not launched or disturbed.

@najibninaba

Copy link
Copy Markdown
Collaborator

Release-mode build regression found while testing local production install.

Repro:
CONFIRM_LOCAL_PRODUCTION_INSTALL=1 make dev-install-local-production

This fails in release compile on this PR with:

  1. ApplicationSecurity.swift: ptrace is no longer in scope.
    origin/main imports it via RepoPrompt-Bridging-Header.h, but this PR removes the -import-objc-header setting from the RepoPrompt target while keeping the release-only anti-debugging call.

  2. RepoPromptCoreRuntimeAliases.swift: RepoPromptCore.DeferredReplayBufferDiagnostics is aliased unconditionally, but the defining type only exists under #if DEBUG.

I checked origin/main at 2544c83 with:
Scripts/run_without_github_tokens.sh swift build -c release --product RepoPrompt

That release build succeeds, so this appears PR-introduced.

Minimal local fix that allowed local production install to complete:

  • Declare the C ptrace symbol directly in ApplicationSecurity.swift, or otherwise preserve an explicit import path for it after removing the bridging header.
  • Wrap the DeferredReplayBufferDiagnostics compatibility alias in #if DEBUG.

@najibninaba

Copy link
Copy Markdown
Collaborator

I hit a blocker while validating a local production build from PR #118 using the MCP CLI connected to the running app.

Repro:

  1. Bind a multi-root workspace through the MCP CLI.
  2. Call manage_selection set or manage_selection add with valid file paths.
  3. The mutation response shows the expected selected files and a non-zero token count.
  4. Immediately call manage_selection get or workspace_context.

Expected: the selection is still visible in the bound context.

Actual: readback comes back empty, with 0 total tokens.

It looks like the selection is built for the mutation response, then not persisted into the bound app/tab state used by readback/export.

I haven’t found the exact cause yet, so I’m not saying PR #118 introduced this. But I did hit it while testing a production build from this PR, and it blocks the rest of the MCP validation until we understand why selection writes don’t survive readback.

@dah

dah commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

For what it's worth, I've been working with this branch all day today and not had any problems.

@erauner12 erauner12 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I left some questions. Have not reviewed all of the code. At this point, I'm just trying to understand how it works and the direction you are considering for it.

...

logged an issue here with some thoughts as well:
#133

Registration(name: "prompt", capability: .safeProfile)
]

static let blockedCapabilities: [String: Capability] = [

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m trying to understand the intended layering here. Is HeadlessToolRegistry meant to own headless tool policy long term, or is the plan for it to eventually derive from a shared/core catalog?

I ask because there seem to be a few related policy vocabularies now: this registry, the app MCP capability map, and the Core session/capability types. As headless grows, it feels like app MCP/headless/CLI-facing surfaces may want to ask the same shared model: "for this runtime/profile, which tools are advertised, blocked, gated, or implemented?"

Maybe the eventual shape is a shared catalog entry with tool name, capabilities, profile availability, implementation availability, and safety contract, with each adapter filtering from that instead of maintaining its own list.

try await assertError(fixture.server, frame: initializeRequest(id: 5), code: -32600)
}

func testRequestOnlyNotificationsAreIgnoredWithoutExecutingTools() async throws {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lifecycle tests are useful. I’m curious whether we should also add a small set of public-contract tests for the headless tool surface itself.

For example, instead of only testing server lifecycle mechanics, maybe we could lock down:

  • the exact default tools/list surface
  • broad categories that remain unavailable until intentionally supported
  • the response shape for blocked/unavailable tools
  • fail-closed behavior when permissions change but a tool still has no registered implementation

That would make the headless automation surface easier to evolve without accidentally changing what external callers can rely on.

@baron

baron commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Expert review — core isolation + headless foundation

This review covered the full merge-base diff (2544c83..3dd01a3, 506 files, +51,013/−14,410) with parallel deep-dives into six areas: the RepoPromptCore extraction, the headless host, security-critical surfaces (PCRE2/path containment/state files), the app MCP runtime, the app-side rewiring (view models/projection adapters), and build/scripts/guardrails. Every finding below was verified against the checked-out PR head; line numbers refer to 3dd01a3.

Overall: this is unusually disciplined work for its size — the containment design in the headless host is genuinely fail-closed (per-component openat/O_NOFOLLOW walks, fstat-after-open fingerprinting, locked atomic state transactions), the parity suites pin real pre-change golden bytes rather than testing the new code against itself, the PCRE2 "new" code is a byte-identical relocation (same git blob), and the layering guardrails actually enforce the documented architecture. The issues that matter are concentrated at a handful of seams. I'd treat findings 1–3 as merge-blocking and 4–10 as needing explicit sign-off or a fast follow.


Blocking / major

1. Codex readOnlyHint projection silently dropped from tools/list
Sources/RepoPrompt/Infrastructure/MCP/MCPConnectionManager.swift:6973
At merge-base, handleListTools emitted annotations: CodexMCPToolAnnotationProjection.project(tool.annotations, clientIdentifier: clientIdentifier), which strips readOnlyHint for identified Codex clients (rationale documented in CodexMCPToolAnnotationProjection.swift:3-7). The rewritten indexed-snapshot loop emits annotations: tool.annotations raw. The projection enum is now referenced only by tests, and the clientIdentifier local at :6864 is unused — strong evidence of an accidental drop during the loop rewrite. Recommendation: reapply the projection in the snapshot loop, or if intentional, delete the enum + tests and call it out in the PR description. A characterization test freezing the wire shape per client would prevent recurrence.

2. Guardrails pin commit objects that exist only on a side branch — squash/rebase merge or branch deletion breaks make guardrails permanently
Scripts/test_shared_runtime_phase2_boundaries.py:16 (PHASE0_ARTIFACT_BASELINE = "48a335e"), Scripts/test_shared_runtime_phase0_characterization.py:85-86, Tests/SharedRuntimeConvergenceFixtures/Phase0/manifest.json:3-4
Verified: 48a335e ("Freeze shared runtime parity baseline") and freeze_head 487cd71d… are not ancestors of this PR's head or of main's merge-base — they live only on origin/core_split. make guardrails (run by CI, the contribution preflight, and ./conductor guardrails) does git ls-tree/git show 48a335e:path on every invocation, so it works only in clones that happen to have fetched core_split with full history. Consequences: deleting core_split breaks guardrails for everyone; shallow or single-branch clones fail today; the manifest also hard-asserts "branch": "core_split". The CI fetch-depth: 0 fix in the top commit is correct but symptomatic — it fixes only CI. Recommendation: replace the git-revision pins with checked-in content hashes of the frozen fixture files (the pattern already used by Scripts/Fixtures/shared-runtime-headless-reviewed.sha256); at minimum use full 40-char SHAs, drop the branch-name assertion, and document the full-history requirement.

3. Queued MCP tool calls now hard-fail on benign catalog churn
MCPConnectionManager.swift:7859-7871, 7918; ServiceRegistry.swift:127-134; invalidation fan-out at MCPServerViewModel.swift:1654-1665, 2072-2075 and WindowRoutingService.swift:3094
ensureIndexedRouteStillInvocable() throws invalidParams whenever the captured route's catalog revision is stale. The route snapshot is captured before limiter.withPermit (:7230 vs :7268), so any call queued behind a long-running call spans the window — and invalidation triggers are broad (any ToolAvailabilityStore.$toolSummaries change from settings toggles, window open/close, or startup registration churn bumps every window's revision). Previously such calls dispatched against the live service and succeeded; now identical-tool catalog churn surfaces as client-visible errors. Recommendation: on isCurrent failure, re-resolve the tool by canonical name against the current committed snapshot and only fail if the tool is genuinely gone or changed for the bound window — or narrow invalidateCatalog so unrelated summary changes don't bump unaffected windows.

4. MCPService.fullShutdown silently drops a join that races the shutdown
Sources/RepoPrompt/Infrastructure/MCP/MCPService.swift:182-187, 286-298
While terminalShutdown is held across the awaited settle/shutdown, reconcileParticipation returns early without recording desire, and latestParticipationRequestByWindow.removeAll() erases the request. A window's join() returns success, windowToolsEnabled stays true, but the listener stays down until the user toggles again. The replaced code explicitly preserved participation registered during suspended shutdown. Two overlapping fullShutdown calls can additionally desynchronize state.isRunning from a running listener, after which leave no-ops (stop() guard at :156) and leaks a participant-less listener. Tests cover join-after-fullShutdown but not join-during. Recommendation: queue participation requests received during terminal shutdown and re-run settleListenerState() after clearing the flag; serialize fullShutdown.

5. MainActor.assumeIsolated in observation-token deinits is a latent crash
Sources/RepoPromptCore/Workspaces/WorkspaceSessionController.swift:99-101, Sources/RepoPromptCore/WorkspaceContext/Selection/WorkspaceSelectionController.swift:16-18
deinit of a @MainActor class is nonisolated and runs on whatever thread drops the last reference; if a token is ever captured by a Task or background-released container, assumeIsolated traps. These tokens are package-visible API, so the invariant is unenforceable at the type level. Recommendation: make the cancel action @Sendable (capturing only [weak controller, id]) and hop via Task { @MainActor in … } from deinit, or require explicit cancel() with a DEBUG-only assertion.

6. New suspension point in materializeCatalogRegularFile can demote a discoverable file to managed-only
Sources/RepoPromptCore/WorkspaceContext/WorkspaceFileContextStore.swift:3388-3430 (await at :3411), indexFile post-block at :4198-4204
During the move, the on-disk existence check became await service.regularFileExistsOnDisk(...), introducing an actor suspension after the "already in catalog" early-return check (:3395). If another task indexes the same path as discoverable during the await, this call falls through to indexFile(..., managedOnly: true), which unconditionally inserts the pre-existing discoverable file's ID into managedOnlyFileIDs — hiding it from scans/tree/search with no removal event and no guaranteed re-promotion. The pre-move synchronous code had no such window. Recommendation: re-check file(rootID:relativePath:) after the await and route through the existing-file branch.

7. Selection-revision recording dropped for direct UI snapshot commits
Sources/RepoPrompt/Features/Workspaces/ViewModels/WorkspaceManagerViewModel.swift:2490-2516 (and DEBUG-only recorder at :215)
The old code allocated a selection revision on every committed selection change; those revisions drive WorkspacePersistenceWriter's stale-payload protection (Sources/RepoPromptCore/Workspaces/WorkspacePersistenceWriter.swift:329-420). Revisions are now allocated only when the publish goes through WorkspaceSelectionCoordinator.flushPendingUISelectionToActiveTab. Direct callers of publishActiveComposeTabSnapshot(commitToMemory: true)AgentModeViewModel.swift:11690/13581/14306/14341, ContextBuilderAgentViewModel.swift:2605/2820, PromptViewModel.swift:3751 — commit live selection with no revision bump, so under cross-window/coalesced write races the writer can classify the newer selection as stale or merge an older record over it. Recommendation: give updateComposeTabFastNoDirty a no-dirty + records-revisions option, or route all direct publishes through the coordinator.

8. Headless: blocking stdin read on the cooperative pool (deadlock risk)
Sources/RepoPromptHeadless/MCP/HeadlessStdioTransport.swift:17
run() calls FileHandle.standardInput.availableData (blocking syscall) inside an async loop, pinning a cooperative-pool thread for the server's lifetime. On a width-1 pool a tools/call can never execute or respond — a hard deadlock; on multicore it permanently consumes one of N threads while concurrent tool execution (HeadlessMCPServer.swift:284-291) competes for the rest. Recommendation: read stdin on a dedicated thread/DispatchQueue (or DispatchIO) feeding an AsyncStream<Data>.

9. Headless: no SIGPIPE handling + exception-raising stdout writes
Sources/RepoPromptHeadless/MCP/HeadlessStdoutWriter.swift:7-9, Support/HeadlessOutput.swift:15, main.swift
Nothing ignores SIGPIPE, and output uses the legacy FileHandle.write(_:) which raises an ObjC exception on I/O error. If the client closes the pipe while the server flushes (e.g. the EOF path in HeadlessStdioTransport.finish() writes -32800 cancellations after stdin EOF), the process dies via SIGPIPE or an uncatchable exception instead of shutting down cleanly. There's also no SIGINT/SIGTERM handling. Recommendation: signal(SIGPIPE, SIG_IGN) at startup, use write(contentsOf:)/raw write(2) with EPIPE-as-quiet-terminate, add signal-driven graceful shutdown.

10. Main-actor perf regressions on prompt copy/estimate paths

  • PromptViewModel.swift:5346-5414: buildClipboardPayload now runs CodeMapExtractor.generateFileTree (:5377) and buildLocalDefinitionBlockIfNeeded (:5393) synchronously on the main actor (old path ran off-main via PromptContextPreAssemblyService.resolve). Affects copyToClipboard() (:3857) and token estimation (:5470/:5476) — large workspaces will hitch the UI.
  • PromptViewModel.swift:5490-5504: the ChatContextTokenBaselineCache was deleted; calculateTokensForChatContext now rebuilds the full chat payload per call, on the main actor, invoked per Oracle send (OracleViewModel.swift:423) and from MCP (MCPPromptContextToolProvider.swift:235). (The estimate-basis change from count/4 to exact-rendered utf8/4 × 1.05 is pinned by parity tests and looks intentional, but reported chat numbers shift ~5-10% — worth a release-note line.)
    Recommendation: hoist the heavy string assembly off-main as before and reintroduce a payload-level memo.

Needs sign-off / minor (grouped)

Projection & token accounting

  • WorkspaceContextProjectionService.swift:266-303, 630-635 + PromptContextAccountingService.swift:177-191: the otherwise fail-closed validation layer never checks capture.coverage. A .projection(codemapCoverage: .referenced) capture fed to a codeMapUsage == .complete request silently iterates truncated snapshots and under-reports "complete" estimates. Throw a captureCoverageInsufficient error instead.
  • TokenCountingViewModel.swift:545-554, 601+: a heavy recount that fails twice returns silently with pendingDirty already consumed — published counts stay stale until the next user-driven dirty. Re-union the consumed kinds on final failure.
  • TokenCountingViewModel.swift:944, 961: asserts on core duplicate-occurrence disagreements will crash debug builds; prefer recoverable diagnostics.

Workspace persistence

  • WorkspaceManagerViewModel.swift:84-104, ~5325: legacy-format normalization write-back was deleted (requiresRewrite feeds only DEBUG counters), so untouched legacy workspace.json files re-normalize on every launch and never converge on disk.
  • WorkspaceManagerViewModel.swift:3271: first poll no longer saves unconditionally (generations start equal); verify restore/first-launch flows still persist captured tab snapshots.
  • PromptViewModel.swift:2692, 2695, 2723 (+ :3068/:3081): closeComposeTabs early-return paths skip the trailing stashed-tab cascade deletion after MCP/git cleanup has already run, and a failed blank-tab creation aborts with tabs never removed. Commit removal before the failable step.
  • WorkspacePersistenceWriter.swift:211-215: the atomic disk write runs actor-isolated, serializing all writer operations behind each write; move it into the existing detached task.
  • WorkspaceFileContextStore.swift:884-931: captureWorkspaceFileContext's while true identity-retry loop has no cap/backoff; sustained churn can livelock the caller.
  • WorkspaceRepository.swift:103-112, 246-249: decode cache accumulates one entry per (path, mtime) for externally-modified workspaces; evict same-path keys on load.

App MCP runtime

  • MCPConnectionManager.swift:6876-6878 + MCPToolCatalogReadiness.swift:123-133: tools/list for nil-window connections changed from proceed-on-timeout to fail-closed internalError, and readiness now requires a committed async snapshot — early-startup multi-window clients can newly error for up to 2s. If intentional, confirm known clients retry.
  • MCPServerViewModel+SelectionReply.swift:82-84: try? swallows accounting failures into empty token stats with no log.
  • PersistentMCPDistinctConnectionConcurrencyTests.swift:167-178 and MCPRuntimeRegistryTests.swift:193-200 grep source text for literal strings — they pass against behaviorally broken refactors and fail against correct ones. Replace with behavioral assertions (the debugConnectionCleanupState surface already exists). Related: the unbounded poll loops at PersistentMCPDistinctConnectionConcurrencyTests.swift:54-77 and MCPConnectionManager.swift:6153-6158 (debugRemoveConnection) hang the suite on regression instead of failing.

Headless host

  • HeadlessSecureFileAccess.swift:95-126: hard links inside a root to out-of-root regular files read successfully; the state-file layer already requires st_nlink == 1 (HeadlessStateFileSecurity.swift:168) — apply the same check (or document the limitation).
  • HeadlessExternalExportFileSecurity.swift:90: the external-export ancestor is opened without O_NOFOLLOW/per-component walk, unlike the in-state path (HeadlessStateFileSecurity.swift:302) — a narrow TOCTOU under the export_outside_state_directory permission.
  • No test anywhere exercises out-of-root rejection through the resolver/tool surface (absolute /etc/hosts, Root/../sibling, nested-root longest-match in HeadlessPathResolver.swift:82-97). Given "fail-closed" is the headline claim, the primary lexical gate deserves direct tests.
  • HeadlessToolRegistry.swift:67: try? turns a corrupt config into a successful empty tools/list instead of an actionable error.
  • HeadlessFileLock.swift:25: flock(LOCK_EX) blocks indefinitely on every request path (and occupies a pool thread — compounds finding 8). Use LOCK_NB + bounded retry with a clear error.
  • HeadlessSearchService.swift:471-474: one >64 KiB line exhausts the whole search budget instead of skipping the subject.
  • HeadlessFileCatalog.swift:56, 221: search/tree skip hidden files but read_file resolves them — agents will conclude dotfiles don't exist yet can read them. Make the policy uniform or filterable.
  • docs/architecture/headless-core.md:139 documents secure-storage behavior that isn't implemented (HeadlessSecureStorage.makeService() is dead code); soften the doc or wire it.

Build / guardrails / tooling

  • Scripts/swift_style.sh:32 now formats Sources/RepoPromptHeadless, but that tree is byte-locked by the reviewed sha256 manifest verified by guardrails — a routine make dev-format breaks guardrails until the baseline is regenerated. Document the regeneration step next to the format workflow or exclude the locked trees.
  • Scripts/conductor.py:920-921: the daemon guardrails op now runs make guardrails (which invokes swift package dump-package) but claims no lanes — it can contend with a concurrent coordinated build. Claim the build lane or use an isolated --scratch-path.
  • Scripts/core_boundary_guardrails.sh:82: RepoPromptCoreMacOS has no UI-import guardrail — it could grow import AppKit without tripping anything. Add the same check Core and Headless get.
  • Makefile:99-100: dev-guardrails is wired but undocumented in AGENTS.md alongside the other new aliases.
  • PCRE2Regex.swift:140-158: firstMatch/enumerateMatches default matchLimits: nil (compile-time ~10M defaults, not the hardened RepoPromptPCRE2MatchPolicy caps). Pre-existing behavior preserved by the move, but defaulting public entry points to conservative limits would close the remaining ReDoS gap.
  • MacOSFSEventsWatcher.swift:192, 212: failure reporting via print to stdout from library code (stdout may be a protocol channel for embedding hosts); route through the diagnostics sink. Also :76-79 calls FSEventStreamFlushSync after Stop (inherited ordering bug; flush-then-stop or delete).

Verified sound (worth keeping as-is)

  • Containment design (headless): lexical standardization + symlink re-resolution in HeadlessPathResolver, then descriptor-relative openat/O_NOFOLLOW/fstat walks; state files get owner/mode/nlink/anchoring validation with atomic same-directory renames; search is comprehensively budgeted (time/bytes/files/PCRE2 match/depth/heap, NUL binary detection). Tests genuinely exercise TOCTOU swaps, FIFOs, symlinks, oversized frames.
  • PCRE2 move: Sources/CSwiftPCRE2/CSwiftPCRE2.c is byte-identical to merge-base (same blob); no new C-interop surface. Vendoring is pre-existing and license-tracked.
  • MCP peer identity: only the kernel-reported LOCAL_PEERPID is trusted (MCPAppProxyTransportBoundary.swift:10-36); handshake-claimed PID is diagnostic-only; connections without a trusted PID are rejected.
  • Transport lease lifecycle (MacOSBootstrapAcceptedTransportLease.swift): closes the fd exactly once across every traced interleaving, including close-during-publish and IO-lease-vs-shutdown races, with deterministic test coverage.
  • Parity suites pin real legacy behavior: AIMessagePromptAssemblyParityTests compares against the still-compiled .legacy strategy and historical byte templates; PromptRenderingParityCharacterizationTests uses hardcoded goldens. Fixtures are deterministic (synthetic timestamps/UUIDs/paths — no machine-local data).
  • Package graph: Core/CoreMacOS/Headless have zero AppKit/SwiftUI/Combine imports (grep-verified); removal of the -import-objc-header unsafe flags is a real improvement and is guardrail-enforced.
  • TokenCalculationService: the generation-guarded task handoff actually fixes a pre-existing clobbering bug; the stopWatchingRoot detach→drain→flush→close ordering is careful; shell scripts are strict-mode, fully quoted, atomic-staged, sudo-from-TTY-only, idempotent.

Review method: six parallel subsystem deep-dives over the merge-base diff at 3dd01a3, each finding verified against the checked-out source before inclusion; the three highest-severity claims (findings 1, 2, 5) were independently re-verified. No builds or tests were run as part of this review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants